Learning and Feature Selection under Budget Constraints in Crowdsourcing
Authors
Abstract
The cost of data acquisition limits the amount of labeled data available for machine learning algorithms, at both the training and the testing phase. This problem is further exacerbated in real-world crowdsourcing applications where labels are aggregated from multiple noisy answers. We tackle classification problems where the underlying feature labels are unknown to the algorithm and a (noisy) label of the desired feature can be acquired at a fixed cost. This problem has two types of budget constraints: the total cost of feature labels available for learning at the training phase, and the cost of features to use during the testing phase for classification. We propose a novel budgeted learning and feature selection algorithm, B-LEAFS, for jointly tackling this problem in the presence of noise. Experimental evaluation on synthetic and real-world crowdsourcing data demonstrates the practical applicability of our approach.
Similar resources
Perform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because feature selection is too complex to do by hand, some of the labeling must be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...
Crowdsourcing Complex Workflows under Budget Constraints
We consider the problem of task allocation in crowdsourcing systems with multiple complex workflows, each of which consists of a set of inter-dependent micro-tasks. We propose Budgeteer, an algorithm to solve this problem under a budget constraint. In particular, our algorithm first calculates an efficient way to allocate budget to each workflow. It then determines the number of inter-dependent...
Model Selection by Linear Programming
Budget constraints arise in many computer vision problems. Computational costs limit many automated recognition systems while crowdsourced systems are hindered by monetary costs. We leverage wide variability in image complexity and learn adaptive model selection policies. Our learnt policy maximizes performance under average budget constraints by selecting “cheap” models for low complexity inst...
Cheaper and Better: Selecting Good Workers for Crowdsourcing
Crowdsourcing provides a popular paradigm for data collection at scale. We study the problem of selecting subsets of workers from a given worker pool to maximize the accuracy under a budget constraint. One natural question is whether we should hire as many workers as the budget allows, or restrict on a small number of top-quality workers. By theoretically analyzing the error rate of a typical se...
Statistical Decision Making for Budget Allocation in Crowdsourcing
In this short paper, we briefly describe some recent progress on statistical decision making for budget allocation in crowdsourcing. We address the budget allocation problem for two important labeling tasks in crowdsourcing: the categorization labeling task and pairwise ranking aggregation. We also show the connections between our work and the “proactive learning” framework proposed by Jaime Ca...
Journal title:
Volume, issue:
Pages: -
Publication date: 2016